Mastering Large-Scale Databases with Postgres

pgconf.de
2023-05-27

Image by Sasin Tipchai from Pixabay
logo EDB

Who am I

Image by Anemone123 from Pixabay

Mastering Large-Scale Databases with Postgres

  • What is a Very Large Database?
  • What kind of problems?
  • Successful Postgres users
  • Some general ideas
Image by Sasin Tipchai from Pixabay
logo EDB

What is a Very Large Database?

  • Very Large Databases (VLDB)
  • Very Busy Databases (VBDB)
Image by Gerd Altmann from Pixabay
logo EDB
“if your database is under 100Gb, it's adorable.”

April 2023, Malta, Christophe Pettus

logo EDB

What is a Very Large Database? (Also called VLDB)

“There is no absolute amount of data that can be cited. This absolute amount of data has varied over time as computer processing, storage and backup methods have become better able to handle larger amounts of data.
Image by 499585 from Pixabay
logo EDB

What is a Very Large Database? (Also called VLDB)

VLDB issues may start to appear when 1 TB is approached, and are more than likely to have appeared as 30 TB or so is exceeded.”

Very large database. 2021, October 16 Wikipedia.

Image by 499585 from Pixabay
logo EDB

What is a Very Busy Database? (Also called VBDB)

  • Relative
  • During some quite long periods the hardware (storage, CPU, RAM) is "too" busy and databases operations are piling up, waiting for resources
Image by Brian Merrill from Pixabay
logo EDB

The vicious circles of high workloads

Image by MPMPix from Pixabay
logo EDB

The vicious circles of high workloads

Image by MPMPix from Pixabay
logo EDB

The vicious circles of high workloads

Image by MPMPix from Pixabay
logo EDB

The vicious circles of high workloads

Image by MPMPix from Pixabay
logo EDB

The vicious circles of high workloads

Image by MPMPix from Pixabay
logo EDB

Problems

  • Backups, Recovery
  • Postgres limits
  • Performance issues
Image by Ryan McGuire from Pixabay
logo EDB

Backups and recovery

It's all about RTO and RPO

Image by tookapic from Pixabay
logo EDB

Backups/Recovery time

  • Lots of IO
    • IO bottleneck
    • RAM eviction for interesting data
    • slower queries
  • Recovery time will be longer too
Image by Nile from Pixabay
logo EDB

Backups/Recovery solutions

  • Pgbackrest (delta mode, block-level incremental backups)
  • VM/storage snapshots
  • Streaming replication (with or without delayed apply)
Image by succo from Pixabay
logo EDB

Postgres limits (tables)

  • 32 max for a table
  • No more tupples than what fits into 4,294,967,295 pages

→ Data retention policy

→ Partitioning

→ Archiving

→ Sharding

Image by Werner Moser from Pixabay
logo EDB

Archiving

  • Put data on another server + FDW (on demand)
  • Dum/Restore data on demand
  • Export data in csv files + FDW (on demand)
Image by Pexels from Pixabay
logo EDB

Postgres limits (Columns/Fields)

  • No more than 1600 columns per table
  • No more than 32 columns in an index
  • No more than 1GB per field

→ Reconsider the data model

Image by KBCH from Pixabay
logo EDB

Postgres limits (Databases)

  • 1,431,650,303 max relations per database
  • 4,294,950,911 max databases per cluster
  • 4,294,967,295 max transaction numbers per cluster (without vacuum freeze)

→ Reconsider the architecture

Image by Dean Moriarty from Pixabay
logo EDB

Sucessful Postgres usages

  • Big companies have big databases
  • Big companies do not like sharing data about their data
Image by Sasin Tipchai from Pixabay
logo EDB

Use case: highly transactional DB with more than 400TB

  • Banking transactional system
  • Very knowledgeable team
  • Data retention policy in place
Image by Gerd Altmann from Pixabay
logo EDB

Use case: highly transactional DB with more than 400TB

  • Trouble with autovacuum
  • Trouble with big tables
  • Trouble with data wraparound
Image by Gerd Altmann from Pixabay
logo EDB

Use case: World Largest Tax Filling System

  • 60+ Million users in 2022
  • 7.24 M new records/day
  • Only 45 TB
logo EDB

Use case: highly transactional DB with more than 400TB

  • Trouble with autovacuum
  • Trouble with big table (22TB)
  • Trouble with high workload
logo EDB

Use case: Real-time Payment System

  • 40 k + transactions per second
  • 6.7 Billions (banking) transactions a month
  • 2 TB + OLTP and 5 TB + OLAP
Image by Ahmad Ardity from Pixabay
logo EDB

Use case: highly transactional DB with more than 400TB

  • Trouble with autovacuum
  • Near 0 dataloss requirement
  • Trouble with high workload
Image by Ahmad Ardity from Pixabay
logo EDB

Some general ideas

Image by Colin Behrens from Pixabay
logo EDB

Do you need the data?

  • Data retention policy
  • Legals (GDPR…)
  • Archiving
    • On other partitions/tables
    • On other storage (tablespaces)
    • On other clusters (FDW)
    • Offline
Image by Gerd Altmann from Pixabay
logo EDB

Storage

  • Use the best you can afford
  • Separate Postgres logs from PGDATA
  • Separate pg_wal from PGDATA
  • Bench before separating indexes from tables
  • Use several storage depending on the tables
Image by Pexels from Pixabay
logo EDB

RAM

  • Take what you can afford
  • Bench before increasing shared_buffers
  • Give more RAM to autovacuum
logo EDB

CPU

  • You'll need more if data_checksums is enabled (recommended)
  • Encrypted data will use more CPU
  • Compressed backups (on the fly) might take more CPU
  • Compressed WALs (on the fly) might take more CPU
logo EDB

Autovacuum

  • It's your friend, not your enemy!
  • Tune it in postgresql.conf
  • Give it more resources
  • Tune it also for specific tables
logo EDB

Performance

Data is bloated
    Before anything, tune autovacuum
  • More usable workers
  • More usable RAM
  • Less naptime
  • Adjust thresholds
Image by ktphotography from Pixabay
logo EDB

Performance

Data is bloated
    ... Then remove the bloat!
  • Vacuum full (stop the app!)
  • export/import (stop the app!)
  • Logical replication
Image by Pexels from Pixabay
logo EDB

Performance

indexes only are bloated
  • Rebuild them…
  • … by building another index if needed
  • … concurrently if needed
logo EDB

Backups

  • RTO
  • Test!
  • Replicas + physical backups
  • Atomic snapshots
  • Snapshots with DB in backup mode

→ Whatever you do, test as much as you can!

logo EDB

Thank you!

Do you have any question?

Image by Myriams-Fotos from Pixabay
logo EDB